04 Performance Measures

Last compiled: 2020-12-29

Goal

In the previous section, we predicted whether or not a product will be put on ‘backorder’ status using H2O models. We now take those models and inspect, visualize, and communicate their performance to business stakeholders.

These are some relevant questions to ponder:

  • How can I visualize the H2O Leaderboard?
  • How can I generate and work with H2O performance objects?
  • How can I analyze models using ROC and Precision vs. Recall Plots, which are essential for data science model selection?
  • How can I communicate the model benefits using Gain and Lift Plots, which are essential for executive communication?
  • How can I make a model diagnostic dashboard using the cowplot package?

For this, I will be reusing the Product Backorders data set (the raw data source is linked below). You may download the data in case you want to try this code on your own.

Please note this is a continuation of the previous section.
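Because this is a continuation, the code below assumes objects created previously — most importantly `automl_models_h2o` (the H2OAutoML object) and `test_tbl` (the hold-out tibble). If you are starting from a fresh session, a minimal sketch of that setup looks roughly like this; the file path, response column name, split ratio, and runtime are assumptions, so adapt them to the previous section:

```r
library(tidyverse)
library(h2o)

h2o.init()  # start / connect to a local H2O cluster

# Assumed setup (names mirror how the objects are used below)
product_backorders_tbl <- read_csv("product_backorders.csv")

split_h2o <- h2o.splitFrame(as.h2o(product_backorders_tbl), ratios = 0.85, seed = 1234)
train_h2o <- split_h2o[[1]]
test_tbl  <- as_tibble(split_h2o[[2]])

# AutoML run whose leaderboard is analyzed in this section
automl_models_h2o <- h2o.automl(
  y                = "went_on_backorder",  # assumed response column
  training_frame   = train_h2o,
  max_runtime_secs = 30,
  nfolds           = 5
)
```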

Raw data source:
Download product_backorders.csv

The work performed here is broken down into multiple steps as follows:
  1. Load libraries
  2. Visualize the Leaderboard
  3. Save multiple models for future use
  4. Tune a model with Grid Search
  5. Visualize trade-off between precision, recall and optimal threshold
  6. Plot Receiver Operating Characteristic (ROC) curve
  7. Plot Precision vs. Recall chart
  8. Plot Gain and Lift charts
  9. Develop an H2O Model Metrics dashboard with cowplot package

Step 1: Load libraries

As a first step, please load the tidyverse and tidymodels libraries along with h2o, ggthemes, glue, and cowplot. For details on what these libraries offer, please refer to the comments in the code block below.

# STEP 1: Load Libraries ---
# Tidy, Transform, & Visualize
library(tidyverse)
#  library(tibble)    --> is a modern re-imagining of the data frame
#  library(readr)     --> provides a fast and friendly way to read rectangular data like csv
#  library(dplyr)     --> provides a grammar of data manipulation
#  library(magrittr)  --> offers a set of operators which make your code more readable (pipe operator)
#  library(tidyr)     --> provides a set of functions that help you get to tidy data
#  library(stringr)   --> provides a cohesive set of functions designed to make working with strings as easy as possible
#  library(ggplot2)   --> graphics

library(tidymodels)
# library(rsample)    --> provides infrastructure for efficient data splitting, resampling and cross validation.
# library(parsnip)    --> provides an API to many powerful modeling algorithms in R.
# library(recipes)    --> tidy interface to data pre-processing (making statistical transformations) tools for feature engineering (prior to modeling).
# library(workflows)  --> bundle your pre-processing, modeling, and post-processing together.
# library(tune)       --> helps you optimize the hyperparameters of your model and pre-processing steps.
# library(yardstick)  --> measures the effectiveness of models using performance metrics (metrics for model comparison).
# library(broom)      --> converts the information in common statistical R objects into user-friendly, predictable formats.
# library(dials)      --> creates and manages tuning parameters and parameter grids.

library(h2o)          # H2O modeling
library(ggthemes)     # Better themes for plotting and color palettes
library(glue)         # Implementation of interpreted string literals
library(cowplot)      # Provides various features to help create publication-quality figures

If you haven’t installed these packages, please install them by calling install.packages("name_of_package") in the R console. After installing, run the above code block again.
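A small helper along those lines — a sketch that installs only what is missing (the package list follows the code block above):

```r
# One-time setup: install any required packages that are not yet present
pkgs    <- c("tidyverse", "tidymodels", "h2o", "ggthemes", "glue", "cowplot")
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)
```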

Step 2: Visualize the Leaderboard

# Visualize the H2O leaderboard to help with model selection
data_transformed_tbl <- automl_models_h2o@leaderboard %>%
  as_tibble() %>%
  select(-c(aucpr, mean_per_class_error, rmse, mse)) %>% 
  mutate(model_type = str_extract(model_id, "[^_]+")) %>%
  slice(1:n()) %>% 
  rownames_to_column(var = "rowname") %>%
  # Visually this step will not change anything
  # It reorders the factors under the hood
  mutate(
    model_id   = as_factor(model_id) %>% reorder(auc),
    model_type = as.factor(model_type)
  ) %>% 
  pivot_longer(cols = -c(model_id, model_type, rowname), 
               names_to = "key", 
               values_to = "value", 
               names_transform = list(key = forcats::fct_inorder)
  ) %>% 
  mutate(model_id = paste0(rowname, ". ", model_id) %>% as_factor() %>% fct_rev())

# Perform visualization
data_transformed_tbl %>%
  ggplot(aes(value, model_id, color = model_type)) +
  geom_point(size = 3) +
  geom_label(aes(label = round(value, 3), hjust = "inward"), show.legend = F) +
  scale_color_gdocs() +
  # Facet to break out logloss and auc
  facet_wrap(~ toupper(key), scales = "free_x") +
  labs(title = "Leaderboard Metrics",
       subtitle = paste0("Ordered by: ", "AUC (Area Under the Curve)"),
       y = "Model Position, Model ID", x = "") + 
  theme(legend.position = "bottom")

Step 3: Save multiple models for future use

# Extract an H2O model name by leaderboard position, so we can use h2o.getModel() more easily
extract_h2o_model_name_by_position <- function(h2o_leaderboard, n = 1, verbose = T) {
  
  model_name <- h2o_leaderboard %>%
    as_tibble() %>%
    slice(n) %>%
    pull(model_id)
  
  if (verbose) message(model_name)
  
  return(model_name)
  
}

# Save multiple models by extracting from leaderboard
for (num in c(1, 2, 3, 15, 16, 17)) {
  automl_models_h2o@leaderboard %>% 
    extract_h2o_model_name_by_position(num) %>%
    h2o.getModel() %>%
    h2o.saveModel(path = "../03_ml_automated/modeling/h2o_models/")
}

Step 5: Visualize trade-off between precision, recall and optimal threshold

# Loading top H2O model
stacked_ensemble_h2o <- h2o.loadModel("../03_ml_automated/modeling/h2o_models/StackedEnsemble_AllModels_AutoML_20201229_021944")

performance_h2o <- h2o.performance(stacked_ensemble_h2o, newdata = as.h2o(test_tbl))
typeof(performance_h2o)
## [1] "S4"
performance_h2o %>% slotNames()
## [1] "algorithm" "on_train"  "on_valid"  "on_xval"   "metrics"
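Besides h2o.metric(), individual summary metrics can be pulled straight off the S4 performance object with dedicated accessor functions:

```r
# Direct accessors on the H2OModelMetrics object
h2o.auc(performance_h2o)              # area under the ROC curve
h2o.logloss(performance_h2o)          # logarithmic loss
h2o.confusionMatrix(performance_h2o)  # confusion matrix at the F1-optimal threshold
```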
performance_tbl <- performance_h2o %>%
  h2o.metric() %>%
  as_tibble() 

performance_tbl %>% 
  glimpse()
## Rows: 400
## Columns: 20
## $ threshold               <dbl> 0.9889336, 0.9819733, 0.9785107, 0.9760445, 0.9714268, 0.9697713, 0.9680889…
## $ f1                      <dbl> 0.01075269, 0.02135231, 0.02836879, 0.04569420, 0.06271777, 0.06944444, 0.0…
## $ f2                      <dbl> 0.006747638, 0.013471037, 0.017945267, 0.029095792, 0.040196516, 0.04462293…
## $ f0point5                <dbl> 0.02645503, 0.05145798, 0.06768190, 0.10638298, 0.14263074, 0.15649452, 0.1…
## $ accuracy                <dbl> 0.8841067, 0.8845266, 0.8849465, 0.8859962, 0.8870460, 0.8874659, 0.8880957…
## $ precision               <dbl> 1.0000000, 0.8571429, 0.8888889, 0.9285714, 0.9473684, 0.9523810, 0.9583333…
## $ recall                  <dbl> 0.005405405, 0.010810811, 0.014414414, 0.023423423, 0.032432432, 0.03603603…
## $ specificity             <dbl> 1.0000000, 0.9997624, 0.9997624, 0.9997624, 0.9997624, 0.9997624, 0.9997624…
## $ absolute_mcc            <dbl> 0.06912713, 0.08855632, 0.10473959, 0.13741716, 0.16387806, 0.17336342, 0.1…
## $ min_per_class_accuracy  <dbl> 0.005405405, 0.010810811, 0.014414414, 0.023423423, 0.032432432, 0.03603603…
## $ mean_per_class_accuracy <dbl> 0.5027027, 0.5052866, 0.5070884, 0.5115929, 0.5160974, 0.5178992, 0.5206019…
## $ tns                     <dbl> 4208, 4207, 4207, 4207, 4207, 4207, 4207, 4206, 4206, 4205, 4205, 4205, 420…
## $ fns                     <dbl> 552, 549, 547, 542, 537, 535, 532, 529, 523, 522, 516, 513, 511, 508, 507, …
## $ fps                     <dbl> 0, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 5, 6, 6, 9, 10, 11, 12, 12, 12, 13, …
## $ tps                     <dbl> 3, 6, 8, 13, 18, 20, 23, 26, 32, 33, 39, 42, 44, 47, 48, 53, 54, 56, 60, 64…
## $ tnr                     <dbl> 1.0000000, 0.9997624, 0.9997624, 0.9997624, 0.9997624, 0.9997624, 0.9997624…
## $ fnr                     <dbl> 0.9945946, 0.9891892, 0.9855856, 0.9765766, 0.9675676, 0.9639640, 0.9585586…
## $ fpr                     <dbl> 0.0000000000, 0.0002376426, 0.0002376426, 0.0002376426, 0.0002376426, 0.000…
## $ tpr                     <dbl> 0.005405405, 0.010810811, 0.014414414, 0.023423423, 0.032432432, 0.03603603…
## $ idx                     <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 2…
theme_new <- theme(
  legend.position  = "bottom",
  legend.title = element_text(size = 11),
  legend.text = element_text(size = 9),
  legend.key       = element_blank(),
  panel.background = element_rect(fill   = "transparent"),
  panel.border     = element_rect(color = "black", fill = NA, size = 0.5),
  panel.grid.major = element_line(color = "grey", size = 0.333)
)

# Row with the threshold that maximizes F1 (the best harmonic balance of precision & recall)
performance_tbl %>%
  filter(f1 == max(f1))

performance_tbl %>%
  ggplot(aes(x = threshold)) +
  geom_line(aes(y = precision, color = "Precision"), size = 0.5) +
  geom_line(aes(y = recall, color = "Recall"), size = 0.5) +
  scale_color_manual(breaks = c("Precision", "Recall"),
                     values = c("blue", "red")) +
  # Insert line where precision and recall are harmonically optimized
  geom_vline(xintercept = h2o.find_threshold_by_max_metric(performance_h2o, "f1")) +
  labs(
    title = "Precision vs. Recall",
    y = "Value",
    x = "Threshold") +
  theme_new
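To act on this plot, the F1-optimal threshold can be applied when converting predicted probabilities into class labels. A sketch — the positive-class probability column is assumed here to be named `Yes`; check the output of h2o.predict() for your data:

```r
# Classify using the threshold that maximizes F1 instead of the 0.5 default
threshold_f1 <- h2o.find_threshold_by_max_metric(performance_h2o, "f1")

predictions_tbl <- h2o.predict(stacked_ensemble_h2o, newdata = as.h2o(test_tbl)) %>%
  as_tibble() %>%
  mutate(predicted_class = ifelse(Yes >= threshold_f1, "Yes", "No"))
```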

Step 6: Plot Receiver Operating Characteristic (ROC) curve

load_model_performance_metrics <- function(path, test_tbl) {
  
  model_h2o <- h2o.loadModel(path)
  perf_h2o  <- h2o.performance(model_h2o, newdata = as.h2o(test_tbl)) 
  
  perf_h2o %>%
    h2o.metric() %>%
    as_tibble() %>%
    mutate(auc = h2o.auc(perf_h2o)) %>%
    select(tpr, fpr, auc)
  
}

model_metrics_tbl <- fs::dir_info(path = "../03_ml_automated/modeling/h2o_models/") %>%
  select(path) %>%
  mutate(metrics = map(path, load_model_performance_metrics, test_tbl)) %>%
  unnest(cols = metrics)
model_metrics_tbl %>%
  arrange(desc(auc)) %>%
  mutate(
    # Extract the model names
    PATH = str_split(path, pattern = "/", simplify = T)[,5] %>% as_factor(),
    AUC  = auc %>% round(3) %>% as.character() %>% as_factor()
  ) %>%
  ggplot(aes(fpr, tpr, color = PATH, linetype = AUC)) +
  geom_line(size = 0.75) +
  scale_color_gdocs() +
  # Dotted 45° line: the performance of a random-guess baseline
  geom_abline(color = "black", linetype = "dotted", size = 0.75) +
  theme_minimal() +
  theme_new +
  theme(legend.direction = "vertical") +
  labs(title = "ROC (Receiver Operating Characteristic) Plot",
       subtitle = "Performance of Top 3 & Bottom 3 Performing Models",
       y = "TPR",
       x = "FPR")
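As a sanity check, the AUC reported by h2o.auc() can be approximated directly from the (fpr, tpr) pairs with the trapezoidal rule — a sketch using the model_metrics_tbl built above:

```r
# Numerical AUC via the trapezoidal rule; should closely match h2o.auc()
model_metrics_tbl %>%
  group_by(path) %>%
  arrange(fpr, .by_group = TRUE) %>%
  summarise(
    auc_trapezoid = sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2),
    auc_h2o       = first(auc)
  )
```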

Step 7: Plot Precision vs. Recall chart

load_model_performance_metrics <- function(path, test_tbl) {
  
  model_h2o <- h2o.loadModel(path)
  perf_h2o  <- h2o.performance(model_h2o, newdata = as.h2o(test_tbl)) 
  
  perf_h2o %>%
    h2o.metric() %>%
    as_tibble() %>%
    mutate(auc = h2o.auc(perf_h2o)) %>%
    select(tpr, fpr, auc, precision, recall)
  
}

model_metrics_tbl <- fs::dir_info(path = "../03_ml_automated/modeling/h2o_models/") %>%
  select(path) %>%
  mutate(metrics = map(path, load_model_performance_metrics, test_tbl)) %>%
  unnest(cols = metrics)
model_metrics_tbl %>%
  arrange(desc(auc)) %>%
  mutate(
    # Extract the model names
    PATH = str_split(path, pattern = "/", simplify = T)[,5] %>% as_factor(),
    AUC  = auc %>% round(3) %>% as.character() %>% as_factor()
  ) %>%
  ggplot(aes(recall, precision, color = PATH, linetype = AUC)) +
  geom_line(size = 0.75) +
  scale_color_gdocs() +
  theme_minimal() +
  theme_new + 
  theme(legend.direction = "vertical") +
  labs(title = "Precision vs Recall Plot",
       subtitle = "Performance of Top 3 & Bottom 3 Performing Models",
       y = "Precision",
       x = "Recall")

Step 8: Plot Gain and Lift charts

# Table for Gain and Lift plotting
gain_lift_tbl <- performance_h2o %>%
  h2o.gainsLift() %>%
  as_tibble()

## Gain Plot
gain_transformed_tbl <- gain_lift_tbl %>% 
  select(group, cumulative_data_fraction, cumulative_capture_rate, cumulative_lift) %>%
  select(-contains("lift")) %>%
  mutate(baseline = cumulative_data_fraction) %>%
  rename(gain     = cumulative_capture_rate) %>%
  # prepare the data for the plotting (for the color and group aesthetics)
  pivot_longer(cols = c(gain, baseline), values_to = "value", names_to = "key")

gain_transformed_tbl %>%
  ggplot(aes(x = cumulative_data_fraction, y = value, color = key)) +
  geom_line(size = 0.5) +
  scale_color_gdocs() +
  theme_minimal() +
  theme_new +
  labs(title = "Gain Chart",
       x = "Cumulative Data Fraction",
       y = "Gain")

## Lift Plot
lift_transformed_tbl <- gain_lift_tbl %>% 
  select(group, cumulative_data_fraction, cumulative_capture_rate, cumulative_lift) %>%
  select(-contains("capture")) %>%
  mutate(baseline = 1) %>%
  rename(lift = cumulative_lift) %>%
  pivot_longer(cols = c(lift, baseline), values_to = "value", names_to = "key")

lift_transformed_tbl %>%
  ggplot(aes(x = cumulative_data_fraction, y = value, color = key)) +
  geom_line(size = 0.5) +
  scale_color_gdocs() +
  theme_minimal() +
  theme_new +
  labs(title = "Lift Chart",
       x = "Cumulative Data Fraction",
       y = "Lift")
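Both charts come straight from gain_lift_tbl, so the same story can be told numerically: the cumulative lift in the first few groups says how many times more backorders the model captures in that top fraction of cases than random targeting would.

```r
# Top groups: data fraction targeted, share of backorders captured (gain), and lift
gain_lift_tbl %>%
  select(group, cumulative_data_fraction, cumulative_capture_rate, cumulative_lift) %>%
  head(3)
```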

Step 9: Develop an H2O Model Metrics dashboard with cowplot package

plot_h2o_performance <- function(h2o_leaderboard, newdata, order_by = c("auc", "logloss"),
                                 top_models = 2, bottom_models = 2, size = 1.5) {
  
  # Inputs
  leaderboard_tbl <- h2o_leaderboard %>%
    as_tibble() %>%
    slice(1:top_models,(n()-bottom_models+1):n())
  
  newdata_tbl <- newdata %>%
    as_tibble()
  
  # Default to the first metric if a vector is supplied
  order_by      <- tolower(order_by[[1]]) 
  
  # Convert string stored in a variable to column name (symbol)
  order_by_expr <- rlang::sym(order_by)
  
  # Turn off the progress bars (the opposite of h2o.show_progress())
  h2o.no_progress()
  
  # 1. Model Metrics
  get_model_performance_metrics <- function(model_id, test_tbl) {
    
    model_h2o <- h2o.getModel(model_id)
    perf_h2o  <- h2o.performance(model_h2o, newdata = as.h2o(test_tbl))
    
    perf_h2o %>%
      h2o.metric() %>%
      as_tibble() %>%
      select(threshold, tpr, fpr, precision, recall)
  }
  
  model_metrics_tbl <- leaderboard_tbl %>%
    mutate(metrics = map(model_id, get_model_performance_metrics, newdata_tbl)) %>%
    unnest(cols = metrics) %>%
    mutate(model_id = as_factor(model_id) %>%
             # programmatically reorder factors depending on order_by
             fct_reorder(!! order_by_expr, 
                         .desc = ifelse(order_by == "auc", TRUE, FALSE)),
           auc      = auc %>% 
             round(3) %>% 
             as.character() %>% 
             as_factor() %>% 
             fct_reorder(as.numeric(model_id)),
           logloss  = logloss %>% 
             round(4) %>% 
             as.character() %>% 
             as_factor() %>% 
             fct_reorder(as.numeric(model_id)))
  
  ## 1A. ROC Plot
  p1 <- model_metrics_tbl %>%
    ggplot(aes(fpr, tpr, color = model_id, linetype = !! order_by_expr)) +
    geom_line(size = size) +
    scale_color_gdocs() +
    theme_minimal() +
    theme_new +
    labs(title = "ROC", x = "FPR", y = "TPR") +
    theme(legend.direction = "vertical") 
  
  ## 1B. Precision vs Recall
  p2 <- model_metrics_tbl %>%
    ggplot(aes(recall, precision, color = model_id, linetype = !! order_by_expr)) +
    geom_line(size = size) +
    scale_color_gdocs() +
    theme_minimal() +
    theme_new +
    labs(title = "Precision Vs Recall", x = "Recall", y = "Precision") +
    theme(legend.position = "none") 
  
  ## 2. Gain / Lift
  get_gain_lift <- function(model_id, test_tbl) {
    
    model_h2o <- h2o.getModel(model_id)
    perf_h2o  <- h2o.performance(model_h2o, newdata = as.h2o(test_tbl)) 
    
    perf_h2o %>%
      h2o.gainsLift() %>%
      as_tibble() %>%
      select(group, cumulative_data_fraction, cumulative_capture_rate, cumulative_lift)
  }
  
  gain_lift_tbl <- leaderboard_tbl %>%
    mutate(metrics = map(model_id, get_gain_lift, newdata_tbl)) %>%
    unnest(cols = metrics) %>%
    mutate(model_id = as_factor(model_id) %>% 
             fct_reorder(!! order_by_expr, 
                         .desc = ifelse(order_by == "auc", TRUE, FALSE)),
           auc  = auc %>% 
             round(3) %>% 
             as.character() %>% 
             as_factor() %>% 
             fct_reorder(as.numeric(model_id)),
           logloss = logloss %>% 
             round(4) %>% 
             as.character() %>% 
             as_factor() %>% 
             fct_reorder(as.numeric(model_id))) %>%
    rename(gain = cumulative_capture_rate,
           lift = cumulative_lift) 
  
  ## 2A. Gain Plot
  p3 <- gain_lift_tbl %>%
    ggplot(aes(cumulative_data_fraction, gain, 
               color = model_id, linetype = !! order_by_expr)) +
    geom_line(size = size) +
    geom_segment(x = 0, y = 0, xend = 1, yend = 1, 
                 color = "red", size = size, linetype = "dotted") +
    scale_color_gdocs() +
    theme_minimal() +
    theme_new +
    expand_limits(x = c(0, 1), y = c(0, 1)) +
    labs(title = "Gain", x = "Cumulative Data Fraction", y = "Gain") +
    theme(legend.position = "none")
  
  ## 2B. Lift Plot
  p4 <- gain_lift_tbl %>%
    ggplot(aes(cumulative_data_fraction, lift, 
               color = model_id, linetype = !! order_by_expr)) +
    geom_line(size = size) +
    geom_segment(x = 0, y = 1, xend = 1, yend = 1, 
                 color = "red", size = size, linetype = "dotted") +
    scale_color_gdocs() +
    theme_minimal() +
    theme_new +
    expand_limits(x = c(0, 1), y = c(0, 1)) +
    labs(title = "Lift", x = "Cumulative Data Fraction", y = "Lift") +
    theme(legend.position = "none")
  
  ### Combine using cowplot
  # cowplot::get_legend extracts a legend from a ggplot object
  p_legend <- get_legend(p1)
  
  # Remove legend from p1
  p1 <- p1 + theme(legend.position = "none")
  
  # cowplot::plot_grid() combines multiple ggplots into a single cowplot object
  p <- cowplot::plot_grid(p1, p2, p3, p4, ncol = 2)
  
  # cowplot::ggdraw() sets up a drawing layer
  p_title <- ggdraw() + 
    
    # cowplot::draw_label() draws text on a ggdraw layer / ggplot object
    draw_label(glue("Metrics for Top {top_models} & Bottom {bottom_models} H2O Models"), 
               size = 18, fontface = "bold", color = "#2C3E50")
  
  p_subtitle <- ggdraw() + 
    draw_label(glue("Ordered by {toupper(order_by)}"), 
               size = 10, color = "#2C3E50")
  
  # Combine everything
  ret <- plot_grid(p_title, p_subtitle, p, p_legend, 
                   # Adjust the relative spacing so that the legend always fits
                   ncol = 1, rel_heights = c(0.05, 0.05, 1, 0.05 * (top_models + bottom_models)))
  
  h2o.show_progress()
  
  return(ret)
}

automl_models_h2o@leaderboard %>%
  plot_h2o_performance(newdata = test_tbl, order_by = "logloss", 
                       size = 0.75, bottom_models = 5, top_models = 5)
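If the dashboard should go into a report or slide deck, it can be persisted with cowplot's save_plot(); the file name and sizing below are illustrative:

```r
# Save the combined figure; ncol/nrow scale the canvas to the 2x2 grid
dashboard <- automl_models_h2o@leaderboard %>%
  plot_h2o_performance(newdata = test_tbl, order_by = "auc")

cowplot::save_plot("h2o_metrics_dashboard.png", dashboard,
                   ncol = 2, nrow = 2, base_height = 4)
```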